Abstract: Convexity is a key concept in information theory,
namely via the many implications of Jensen's inequality, such
as the non-negativity of the Kullback-Leibler divergence (KLD).
Jensen's inequality also underlies the concept of Jensen-Shannon
divergence (JSD), which is a symmetrized and smoothed version
of the KLD. This paper introduces new JSD-type divergences,
by extending its two building blocks: convexity and Shannon's
entropy. In particular, a new concept of q-convexity is introduced
and shown to satisfy a Jensen's q-inequality. Based on this
Jensen's q-inequality, the Jensen-Tsallis q-difference is built,
which is a nonextensive generalization of the JSD, based on
Tsallis entropies. Finally, the Jensen-Tsallis q-difference is
characterized in terms of convexity and extrema.

Index Terms: Convexity, Tsallis entropy, nonextensive entropies,
Jensen-Shannon divergence, mutual information.

I. Introduction 

The central role played by the Shannon entropy in infor- 
mation theory has stimulated the proposal of several gener- 
alizations and extensions during the last decades (see, e.g., 
[1], [2], [3], [4], [5], [6], [7]). One of the best known of 
these generalizations is the family of Renyi entropies, which 
has the Shannon entropy as a limit case [1], and has been 
used in several applications (e.g., [8], [9]). The Renyi and
Shannon entropies share the well-known additivity property, 
under which the joint entropy of a pair of independent random 
variables is simply the sum of the individual entropies. In 
other generalizations, namely those introduced by Havrda- 
Charvat [2], Daroczi [3], and Tsallis [7], the additivity property 
is abandoned, yielding the so-called nonextensive entropies. 
These nonextensive entropies have raised great interest among 
physicists in modeling certain physical phenomena (such as 
those exhibiting long-range interactions and multifractal be- 
havior) and as a framework for nonextensive generalizations 
of the classical Boltzmann-Gibbs statistical mechanics [10], 
[11]. Nonextensive entropies have also been recently used in 
signal/image processing (e.g., [12], [13], [14]) and many other 
areas [15]. 
Convexity is a key concept in information theory, namely
via the many important corollaries of Jensen's inequality [16], 
such as the non-negativity of the relative Shannon entropy, 
or Kullback-Leibler divergence (KLD) [17]. The Jensen in- 
equality is also at the basis of the concept of Jensen-Shannon 
divergence (JSD), which is a symmetrized and smoothed 
version of the KLD [18], [19]. The JSD is widely used in 
areas such as statistics, machine learning, image and signal 
processing, and physics. 

The goal of this paper is to introduce new extensions of 
JSD-type divergences, by extending its two building blocks: 
convexity and the Shannon entropy. In previous work [?], we
investigated how these extensions may be applied in kernel-based
machine learning. More specifically, the main contributions
of this paper are:

• The concept of q-convexity, as a generalization of convexity,
for which we prove a Jensen q-inequality. The
related concept of Jensen q-differences, which generalize
Jensen differences, is also proposed. Based on these
concepts, we introduce the Jensen-Tsallis q-difference, a
nonextensive generalization of the JSD, which is also a
"mutual information" in the sense of Furuichi [20].

• Characterization of the Jensen-Tsallis q-difference, with
respect to convexity and its extrema, extending results
obtained by Burbea and Rao [21] and by Lin [19] for the
JSD.

The rest of the paper is organized as follows. Section II
reviews the concepts of nonextensive entropies, with emphasis
on the Tsallis case. Section III discusses Jensen differences and
divergences. The concepts of q-differences and q-convexity
are introduced in Section IV, where they are used to define
and characterize some new divergence-type quantities. Section
V defines the Jensen-Tsallis q-difference and derives some
properties. Finally, Section VI contains concluding remarks
and mentions directions for future research.

II. Nonextensive Entropies

A. Suyari's Axiomatization 

Inspired by the Shannon-Khinchin axiomatic formulation of 
the Shannon entropy [22], [23], Suyari proposed an axiomatic 
framework for nonextensive entropies and a uniqueness theo- 
rem [24]. Let 



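$$\Delta^{n-1} := \Big\{ (p_1, \ldots, p_n) \in \mathbb{R}^n : p_i \ge 0, \ \sum_{i=1}^n p_i = 1 \Big\} \tag{1}$$

denote the $(n-1)$-dimensional simplex. The Suyari axioms
(see Appendix) determine the function $S_{q,\phi} : \Delta^{n-1} \to \mathbb{R}$
given by

$$S_{q,\phi}(p_1, \ldots, p_n) = \frac{k}{\phi(q)} \Big( 1 - \sum_{i=1}^n p_i^q \Big), \tag{2}$$

where $q, k \in \mathbb{R}_+$, $S_{1,\phi} := \lim_{q \to 1} S_{q,\phi}$, and $\phi : \mathbb{R}_+ \to \mathbb{R}$ is a
continuous function satisfying the following three conditions:

(i) $\phi(q)$ has the same sign as $q - 1$;

(ii) $\phi(q)$ vanishes if and only if $q = 1$;

(iii) $\phi$ is differentiable in a neighborhood of 1 and
$\phi'(1) = 1$.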
For any $\phi$ satisfying these conditions, $S_{q,\phi}$ has the pseudoadditivity
property: for any two independent random variables
$A$ and $B$, with probability mass functions $p_A \in \Delta^{n_A - 1}$ and
$p_B \in \Delta^{n_B - 1}$, respectively,

$$S_{q,\phi}(A \times B) = S_{q,\phi}(A) + S_{q,\phi}(B) - \frac{\phi(q)}{k}\, S_{q,\phi}(A)\, S_{q,\phi}(B),$$

where we denote (as usual) $S_{q,\phi}(A) := S_{q,\phi}(p_A)$.
For $q = 1$, we recover the Shannon entropy,

$$S_{1,\phi}(p_1, \ldots, p_n) = H(p_1, \ldots, p_n) = -k \sum_{i=1}^n p_i \ln p_i, \tag{3}$$

thus pseudoadditivity turns into additivity.
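
The pseudoadditivity identity can be spot-checked numerically. The sketch below is only illustrative: the function names are hypothetical, the entropy is taken in the form $(k/\phi(q))(1 - \sum_i p_i^q)$, and $\phi(q) = q - 1$ is used as one admissible choice of $\phi$ (an assumption of the example, not a requirement of the identity).

```python
def suyari_entropy(p, q, phi, k=1.0):
    """Entropy of the form (k / phi(q)) * (1 - sum_i p_i^q), for q != 1."""
    return (k / phi(q)) * (1.0 - sum(pi ** q for pi in p if pi > 0))


def pseudoadditivity_gap(p_a, p_b, q, phi, k=1.0):
    """Left-hand side minus right-hand side of the pseudoadditivity identity."""
    # Joint pmf of two independent variables: outer product of the marginals.
    joint = [a * b for a in p_a for b in p_b]
    s_a = suyari_entropy(p_a, q, phi, k)
    s_b = suyari_entropy(p_b, q, phi, k)
    s_ab = suyari_entropy(joint, q, phi, k)
    return s_ab - (s_a + s_b - (phi(q) / k) * s_a * s_b)


def phi_example(q):
    """Illustrative choice of phi (assumption of this sketch): phi(q) = q - 1."""
    return q - 1.0


for q in (0.5, 2.0, 3.0):
    # Each gap should be zero up to floating-point rounding.
    print(q, pseudoadditivity_gap([0.2, 0.8], [0.5, 0.3, 0.2], q, phi_example))
```

The gap vanishes because $1 - ab = (1-a) + (1-b) - (1-a)(1-b)$ with $a = \sum_i p_{A,i}^q$ and $b = \sum_j p_{B,j}^q$, so the check does not depend on the particular $\phi$ or $k$.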

B. Tsallis Entropies 

Several proposals for $\phi$ have appeared [2], [3], [7]. In the
rest of the paper, we set $\phi(q) = q - 1$, which yields the Tsallis
entropy:

$$S_q(p_1, \ldots, p_n) = \frac{k}{q-1} \Big( 1 - \sum_{i=1}^n p_i^q \Big). \tag{4}$$

To simplify, we let $k = 1$ and write the Tsallis entropy as

$$S_q(X) := S_q(p_1, \ldots, p_n) = -\sum_{x \in \mathcal{X}} p(x)^q \ln_q p(x), \tag{5}$$

where $\ln_q(x) := (x^{1-q} - 1)/(1-q)$ is the q-logarithm
function, which satisfies $\ln_q(xy) = \ln_q(x) + x^{1-q} \ln_q(y)$ and
$\ln_q(1/x) = -x^{q-1} \ln_q(x)$.
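
As a minimal sketch of these definitions (with $k = 1$), the q-logarithm and the Tsallis entropy can be coded directly. The function names are hypothetical, and treating $q = 1$ as the natural-logarithm limit and skipping zero-probability outcomes are conventions assumed for the example.

```python
import math


def ln_q(x, q):
    """q-logarithm: (x^(1-q) - 1) / (1 - q); natural log when q == 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_entropy(p, q):
    """Tsallis entropy with k = 1; zero-probability outcomes are skipped."""
    return -sum(pi ** q * ln_q(pi, q) for pi in p if pi > 0)


p = [0.5, 0.25, 0.25]
print(tsallis_entropy(p, 2.0))  # same value as 1 - sum_i p_i^2
print(tsallis_entropy(p, 1.0))  # Shannon entropy, in nats
```

For $q = 2$ the entropy reduces to $1 - \sum_i p_i^2$, which gives a quick way to sanity-check an implementation.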

Furuichi derived some information theoretic properties of
Tsallis entropies [20]. Tsallis joint and conditional entropies
are defined, respectively, as

$$S_q(X,Y) := -\sum_{x,y} p(x,y)^q \ln_q p(x,y) \tag{6}$$

and

$$S_q(X|Y) := -\sum_{x,y} p(x,y)^q \ln_q p(x|y) = \sum_y p(y)^q S_q(X|y), \tag{7}$$

and the chain rule $S_q(X,Y) = S_q(X) + S_q(Y|X)$ holds.
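
The chain rule can be verified on a small joint distribution. This is a hedged sketch: the helper names and the 2x2 joint table are made up for illustration, and the joint pmf is stored as `joint[x][y]`.

```python
import math


def ln_q(x, q):
    """q-logarithm, with the natural log as the q == 1 case."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_entropy(p, q):
    """Tsallis entropy (k = 1) of a pmf given as a flat list."""
    return -sum(pi ** q * ln_q(pi, q) for pi in p if pi > 0)


def tsallis_conditional_y_given_x(joint, q):
    """S_q(Y|X) = -sum_{x,y} p(x,y)^q ln_q p(y|x), with joint[x][y] = p(x,y)."""
    total = 0.0
    for row in joint:
        p_x = sum(row)
        for p_xy in row:
            if p_xy > 0:
                total -= p_xy ** q * ln_q(p_xy / p_x, q)
    return total


joint = [[0.4, 0.1], [0.1, 0.4]]          # illustrative joint pmf
p_x = [sum(row) for row in joint]          # marginal of X
flat = [p for row in joint for p in row]   # joint pmf as a flat list
for q in (0.5, 1.0, 2.0):
    lhs = tsallis_entropy(flat, q)
    rhs = tsallis_entropy(p_x, q) + tsallis_conditional_y_given_x(joint, q)
    print(q, lhs, rhs)  # the two columns should agree
```

The agreement follows from $\ln_q(ab) = \ln_q(b) + b^{1-q} \ln_q(a)$ applied with $a = p(x)$ and $b = p(y|x)$.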

For two probability mass functions $p_X, p_Y \in \Delta^{n-1}$, the
Tsallis relative entropy, generalizing the KLD, is defined as

$$D_q(p_X \| p_Y) := -\sum_x p_X(x) \ln_q \frac{p_Y(x)}{p_X(x)}. \tag{8}$$

Finally, the Tsallis mutual entropy is defined as

$$I_q(X;Y) := S_q(X) - S_q(X|Y) = S_q(Y) - S_q(Y|X), \tag{9}$$

generalizing (for $q > 1$) Shannon's mutual information [20]. In
Section V, we establish a relationship between Tsallis mutual
entropy and a quantity called Jensen-Tsallis q-difference, generalizing
the one between mutual information and the JSD [25].

Furuichi considers an alternative generalization of Shannon's
mutual information,

$$\tilde{I}_q(X;Y) := D_q(p_{X,Y} \| p_X \otimes p_Y), \tag{10}$$

where $p_{X,Y}$ is the true joint probability mass function of
$(X, Y)$ and $p_X \otimes p_Y$ denotes their joint probability if they
were independent [20]. This alternative definition has also
been used as a "Tsallis mutual entropy" [26]; notice that
$\tilde{I}_q(X;Y) \ne I_q(X;Y)$ in general, the case $q = 1$ being a
notable exception. In Section V, we show that this alternative
definition also leads to a nonextensive analogue of the JSD.
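
To make the difference between the two generalizations concrete, the following sketch evaluates both on a small joint distribution. The function names and the 2x2 table are hypothetical, and `joint[x][y]` is assumed to hold $p(x,y)$ with strictly positive marginals.

```python
import math


def ln_q(x, q):
    """q-logarithm, with the natural log as the q == 1 case."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def marginals(joint):
    """Row sums give p(x); column sums give p(y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def mutual_entropy(joint, q):
    """S_q(X) - S_q(X|Y), the entropy-difference definition."""
    p_x, p_y = marginals(joint)
    s_x = -sum(px ** q * ln_q(px, q) for px in p_x if px > 0)
    s_x_given_y = -sum(
        pxy ** q * ln_q(pxy / p_y[j], q)
        for row in joint for j, pxy in enumerate(row) if pxy > 0
    )
    return s_x - s_x_given_y


def divergence_mutual_entropy(joint, q):
    """Tsallis relative entropy between the joint and the product of marginals."""
    p_x, p_y = marginals(joint)
    return -sum(
        pxy * ln_q(p_x[i] * p_y[j] / pxy, q)
        for i, row in enumerate(joint) for j, pxy in enumerate(row) if pxy > 0
    )


joint = [[0.4, 0.1], [0.1, 0.4]]  # illustrative joint pmf
for q in (1.0, 2.0):
    print(q, mutual_entropy(joint, q), divergence_mutual_entropy(joint, q))
```

On this example the two quantities coincide at $q = 1$ (both equal Shannon's mutual information) and differ at $q = 2$.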

C. Denormalization of Tsallis Entropies 

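In the sequel, we extend the domain of Tsallis entropies
from $\Delta^{n-1}$ to the set of unnormalized measures,
$\mathbb{R}_+^n := \{(x_1, \ldots, x_n) \in \mathbb{R}^n \mid \forall i,\ x_i \ge 0\}$. The Tsallis entropy of a
measure is defined as

$$S_q(x_1, \ldots, x_n) := -\sum_{i=1}^n x_i^q \ln_q x_i = \sum_{i=1}^n \varphi_q(x_i), \tag{11}$$

where $\varphi_q : \mathbb{R}_+ \to \mathbb{R}$ is given by

$$\varphi_q(y) = \begin{cases} -y \ln y, & \text{if } q = 1, \\ (y - y^q)/(q - 1), & \text{if } q \ne 1. \end{cases} \tag{12}$$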

III. Jensen Differences and Divergences 

A. The Jensen Difference 

Jensen's inequality states that, if $f$ is a concave function
and $X$ is an integrable real-valued random variable,

$$f(E[X]) - E[f(X)] \ge 0. \tag{13}$$

Burbea and Rao studied the difference on the left-hand side
of (13), with $f := H_\varphi$, where $H_\varphi : [a,b]^n \to \mathbb{R}$ is a concave
function, called a $\varphi$-entropy, defined as

$$H_\varphi(x) := -\sum_{i=1}^n \varphi(x_i), \tag{14}$$

where $\varphi : [a,b] \to \mathbb{R}$ is convex [21]. The result is called the
Jensen difference, as formalized in the following definition.

Definition 1: The Jensen difference $J_\Psi^\pi : \mathbb{R}_+^{nm} \to \mathbb{R}$
induced by a (concave) generalized entropy $\Psi : \mathbb{R}_+^n \to \mathbb{R}$
and weighted by $(\pi_1, \ldots, \pi_m) \in \Delta^{m-1}$ is

$$J_\Psi^\pi(x_1, \ldots, x_m) := \Psi\Big(\sum_{j=1}^m \pi_j x_j\Big) - \sum_{j=1}^m \pi_j \Psi(x_j) = \Psi(E[X]) - E[\Psi(X)], \tag{15}$$

where both expectations are with respect to $(\pi_1, \ldots, \pi_m)$.

In the following subsections, we consider several instances
of Definition 1, leading to several Jensen-type divergences.



B. The Jensen-Shannon Divergence 

Let $P$ be a random probability distribution taking values
in $\{p_y\}_{y=1,\ldots,m} \subseteq \Delta^{n-1}$ according to a distribution
$\pi = (\pi_1, \ldots, \pi_m) \in \Delta^{m-1}$. (In classification/estimation theory
parlance, $\pi$ is called the prior distribution and $p_y := p(\cdot|y)$
the likelihood function.) Then, (15) becomes

$$J_\Psi^\pi(p_1, \ldots, p_m) = \Psi(E[P]) - E[\Psi(P)], \tag{16}$$

where the expectations are with respect to $\pi$.

Let now $\Psi = H$, the Shannon entropy. Consider the
random variables $Y$ and $X$, taking values respectively in
$\mathcal{Y} = \{1, \ldots, m\}$ and $\mathcal{X} = \{1, \ldots, n\}$, with probability mass
functions $\pi(y) := \pi_y$ and $p(x) := \sum_{y=1}^m p(x|y)\,\pi(y)$. Using
standard notation of information theory [17],

$$J^\pi(P) := J_H^\pi(p_1, \ldots, p_m) = H(X) - H(X|Y) = I(X;Y), \tag{17}$$

where $I(X;Y)$ is the mutual information between $X$ and $Y$.
Since $I(X;Y)$ is also equal to the KLD between the joint
distribution and the product of the marginals [17], we have

$$J^\pi(P) = H(E[P]) - E[H(P)] = E[D(P \| E[P])]. \tag{18}$$

The quantity $J_H^\pi(p_1, \ldots, p_m)$ is called the Jensen-Shannon
divergence (JSD) of $p_1, \ldots, p_m$, with weights $\pi_1, \ldots, \pi_m$ [21],
[19]. Equality (18) allows two interpretations of the JSD: (i)
the Jensen difference of the Shannon entropy of $P$; or (ii) the
expected KLD between $P$ and the expectation of $P$.

A remarkable fact is that $J^\pi(P) = \min_Q E[D(P \| Q)]$,
i.e., $Q^* = E[P]$ is a minimizer of $E[D(P \| Q)]$ with respect
to $Q$. It has been shown that this property together with
equality (18) characterize the so-called Bregman divergences:
they hold not only for $\Psi = H$, but for any concave $\Psi$ and the
corresponding Bregman divergence, in which case $J_\Psi^\pi$ is the
Bregman information (see [27] for details).

When $m = 2$ and $\pi = (1/2, 1/2)$, $P$ may be seen as a
random distribution whose value on $\{p_1, p_2\}$ is chosen by
tossing a fair coin. In this case, $J^{(1/2,1/2)}(P) = JS(p_1, p_2)$,
where

$$JS(p_1, p_2) = H\Big(\frac{p_1 + p_2}{2}\Big) - \frac{H(p_1) + H(p_2)}{2} = \frac{1}{2} D\Big(p_1 \Big\| \frac{p_1 + p_2}{2}\Big) + \frac{1}{2} D\Big(p_2 \Big\| \frac{p_1 + p_2}{2}\Big), \tag{19}$$

as introduced in [19]. It has been shown that $\sqrt{JS}$ satisfies the
triangle inequality (hence being a metric) and that, moreover,
it is a Hilbertian metric [28], [29].
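
The two interpretations of the JSD (entropy difference and expected KLD to the mixture) can be compared numerically. The sketch below uses hypothetical function names and assumes natural logarithms and distributions given as plain lists.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability outcomes are skipped."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, r):
    """Kullback-Leibler divergence D(p || r); assumes r > 0 wherever p > 0."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)


def mixture(ps, weights):
    """Weighted average of the distributions, i.e. E[P]."""
    return [sum(w * p[i] for w, p in zip(weights, ps)) for i in range(len(ps[0]))]


def jsd_entropy_form(ps, weights):
    """H(E[P]) - E[H(P)]."""
    mix = mixture(ps, weights)
    return shannon_entropy(mix) - sum(w * shannon_entropy(p) for w, p in zip(weights, ps))


def jsd_kl_form(ps, weights):
    """E[D(P || E[P])]."""
    mix = mixture(ps, weights)
    return sum(w * kl_divergence(p, mix) for w, p in zip(weights, ps))


ps = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # illustrative distributions
weights = [0.5, 0.5]
print(jsd_entropy_form(ps, weights), jsd_kl_form(ps, weights))
```

With equal weights and two disjoint degenerate distributions, both forms give $\ln 2$, the largest value the two-distribution JSD can take.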

C. The Jensen-Renyi Divergence 

Consider again the scenario above (Subsection III-B), now
with the Renyi q-entropy

$$R_q(p) = \frac{1}{1-q} \ln \sum_{i=1}^n p_i^q \tag{20}$$

replacing the Shannon entropy. The Renyi q-entropy is concave
for $q \in [0, 1)$ and has the Shannon entropy as the limit
when $q \to 1$ [1]. Letting $\Psi = R_q$, (16) becomes

$$J_{R_q}^\pi(p_1, \ldots, p_m) = R_q(E[P]) - E[R_q(P)]. \tag{21}$$

Unlike in the JSD case, there is no counterpart of equality (18)
based on the Renyi q-divergence

$$D_{R_q}(p_1 \| p_2) = \frac{1}{q-1} \ln \sum_{i=1}^n p_{1i}^q\, p_{2i}^{1-q}. \tag{22}$$

The quantity $J_{R_q}^\pi$ in (21) is called the Jensen-Renyi divergence
(JRD). Furthermore, when $m = 2$ and $\pi = (1/2, 1/2)$,
we write $J_{R_q}^\pi(P) = JR_q(p_1, p_2)$, where

$$JR_q(p_1, p_2) = R_q\Big(\frac{p_1 + p_2}{2}\Big) - \frac{R_q(p_1) + R_q(p_2)}{2}. \tag{23}$$

The JRD has been used in several signal/image processing
applications, such as registration, segmentation, denoising, and
classification [30], [31], [32].
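
A minimal sketch of the two-distribution Jensen-Renyi divergence follows. The function names are hypothetical, and the code assumes $q \ne 1$ and skips zero-probability outcomes.

```python
import math


def renyi_entropy(p, q):
    """Renyi q-entropy ln(sum_i p_i^q) / (1 - q), for q != 1."""
    return math.log(sum(pi ** q for pi in p if pi > 0)) / (1.0 - q)


def jensen_renyi(p1, p2, q):
    """Renyi entropy of the midpoint minus the average of the Renyi entropies."""
    mid = [(a + b) / 2.0 for a, b in zip(p1, p2)]
    return renyi_entropy(mid, q) - (renyi_entropy(p1, q) + renyi_entropy(p2, q)) / 2.0


print(jensen_renyi([0.9, 0.1], [0.2, 0.8], 0.5))
print(jensen_renyi([1.0, 0.0], [0.0, 1.0], 0.5))  # disjoint degenerate case
```

For two disjoint degenerate distributions the value is $\ln 2$ for every admissible $q$, since both individual entropies vanish and the midpoint is uniform on two outcomes.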

D. The Jensen-Tsallis Divergence 

Burbea and Rao have defined divergences of the form (16)
based on the Tsallis q-entropy $S_q$, defined in (11) [21]. Like
the Shannon entropy, but unlike the Renyi entropies, the Tsallis
q-entropy is an instance of a $\varphi$-entropy (see (14)). Letting
$\Psi = S_q$, (16) becomes

$$J_{S_q}^\pi(p_1, \ldots, p_m) = S_q(E[P]) - E[S_q(P)]. \tag{24}$$

Again, like in Subsection III-C, if we consider the Tsallis q-divergence,

$$D_q(p_1 \| p_2) = \frac{1}{1-q} \Big( 1 - \sum_{i=1}^n p_{1i}^q\, p_{2i}^{1-q} \Big), \tag{25}$$

there is no counterpart of the equality (18).

The quantity $J_{S_q}^\pi$ in (24) is called the Jensen-Tsallis divergence
(JTD) and it has also been applied in image processing
[33]. Unlike the JSD, the JTD lacks an interpretation as a
mutual information. In spite of this, for $q \in [1, 2]$, the JTD
exhibits joint convexity [21]. In the next section, we propose
an alternative to the JTD which, amongst other features, is
interpretable as a nonextensive mutual information (in the
sense of Furuichi [20]) and is jointly convex, for $q \in [0, 1]$.

IV. q-Convexity and q-Differences
A. Introduction 

This section introduces a novel class of functions, termed
Jensen q-differences (JqD), that generalizes Jensen differences.
We will later (Section V) use the JqD to define the Jensen-Tsallis
q-difference (JTqD), which we will propose as an
alternative nonextensive generalization of the JSD, instead of
the JTD discussed in Subsection III-D.

We begin by recalling the concept of q-expectation, which
is used in nonextensive thermodynamics [7].

Definition 2: The unnormalized q-expectation of a finite
random variable $X \in \mathcal{X}$, with probability mass function
$p_X(x)$, is

$$E_q[X] := \sum_{x \in \mathcal{X}} x\, p_X(x)^q. \tag{26}$$

Of course, $q = 1$ corresponds to the standard notion of
expectation. For $q \ne 1$, the q-expectation does not correspond
to the intuitive meaning of average/expectation (e.g., $E_q[1] \ne 1$
in general). Nonetheless, it has been used in the construction
of nonextensive information theoretic concepts such as the
Tsallis entropy, which can be written compactly as
$S_q(X) = -E_q[\ln_q p(X)]$.
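
The unnormalized q-expectation and the compact form of the Tsallis entropy can be illustrated with a few lines of code. The names below are hypothetical and the probabilities are assumed strictly positive.

```python
import math


def ln_q(x, q):
    """q-logarithm, with the natural log as the q == 1 case."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_expectation(values, probs, q):
    """Unnormalized q-expectation: sum_x x * p(x)^q."""
    return sum(v * p ** q for v, p in zip(values, probs))


probs = [0.5, 0.25, 0.25]
# The q-expectation of the constant 1 is sum_x p(x)^q, which is not 1 for q = 2.
print(q_expectation([1.0] * len(probs), probs, 2.0))
# Tsallis entropy written as minus the q-expectation of ln_q p(X).
print(-q_expectation([ln_q(p, 2.0) for p in probs], probs, 2.0))
```

For this distribution and $q = 2$, the first quantity is $\sum_x p(x)^2 = 0.375$ and the second is $1 - 0.375 = 0.625$.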

B. q-Convexity 

We now introduce the novel concept of q-convexity and use
it to derive a set of results, among which we emphasize a
q-Jensen inequality.

Definition 3: Let $q \in \mathbb{R}$ and $\mathcal{X}$ be a convex set. A function
$f : \mathcal{X} \to \mathbb{R}$ is q-convex if for any $x, y \in \mathcal{X}$ and $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \le \lambda^q f(x) + (1 - \lambda)^q f(y). \tag{27}$$

Naturally, $f$ is q-concave if $-f$ is q-convex. Of course,
1-convexity is the usual notion of convexity. The next proposition
states the q-Jensen inequality.

Proposition 4: If $f : \mathcal{X} \to \mathbb{R}$ is q-convex, then for any
$n \in \mathbb{N}$, $x_1, \ldots, x_n \in \mathcal{X}$ and $\pi = (\pi_1, \ldots, \pi_n) \in \Delta^{n-1}$,

$$f\Big(\sum_{i=1}^n \pi_i x_i\Big) \le \sum_{i=1}^n \pi_i^q f(x_i). \tag{28}$$

Proof: Use induction, exactly as in the proof of the
standard Jensen inequality (e.g., [17]). ■
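
For completeness, the induction step can be written out as follows. This is a sketch under the assumption that $\pi_n < 1$ (otherwise the claim is trivial) and that the inequality already holds for $n - 1$ points; the renormalized weights $\pi_i / (1 - \pi_n)$ are introduced here only for the derivation.

```latex
\begin{align*}
f\Big(\sum_{i=1}^{n} \pi_i x_i\Big)
 &= f\Big(\pi_n x_n + (1-\pi_n) \sum_{i=1}^{n-1} \tfrac{\pi_i}{1-\pi_n}\, x_i\Big) \\
 &\le \pi_n^q f(x_n) + (1-\pi_n)^q f\Big(\sum_{i=1}^{n-1} \tfrac{\pi_i}{1-\pi_n}\, x_i\Big) \\
 &\le \pi_n^q f(x_n) + (1-\pi_n)^q \sum_{i=1}^{n-1} \Big(\tfrac{\pi_i}{1-\pi_n}\Big)^{q} f(x_i)
  = \sum_{i=1}^{n} \pi_i^q f(x_i).
\end{align*}
```

The first inequality is the definition of q-convexity with $\lambda = \pi_n$; the second applies the induction hypothesis and uses $(1 - \pi_n)^q \ge 0$.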

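Proposition 5: Let $f \ge 0$ and $q \ge q' \ge 0$; then,

$$f \text{ is } q\text{-convex} \ \Rightarrow \ f \text{ is } q'\text{-convex} \tag{29}$$

$$-f \text{ is } q'\text{-convex} \ \Rightarrow \ -f \text{ is } q\text{-convex}. \tag{30}$$

Proof: Implication (29) results from

$$f(\lambda x + (1 - \lambda) y) \le \lambda^q f(x) + (1 - \lambda)^q f(y) \le \lambda^{q'} f(x) + (1 - \lambda)^{q'} f(y),$$

where the first inequality states the q-convexity of $f$ and the
second one is valid because $f(x), f(y) \ge 0$ and $t^{q'} \ge t^q \ge 0$,
for any $t \in [0, 1]$ and $q \ge q'$. The proof of (30) is analogous. ■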



C. Jensen q-Differences 

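We now generalize Jensen differences, formalized in Definition 1,
by introducing the concept of Jensen q-differences.

Definition 6: For $q \ge 0$, the Jensen q-difference induced by
a (concave) generalized entropy $\Psi : \mathbb{R}_+^n \to \mathbb{R}$ and weighted
by $(\pi_1, \ldots, \pi_m) \in \Delta^{m-1}$ is

$$T_{q,\Psi}^\pi(x_1, \ldots, x_m) := \Psi\Big(\sum_{j=1}^m \pi_j x_j\Big) - \sum_{j=1}^m \pi_j^q \Psi(x_j) = \Psi(E[X]) - E_q[\Psi(X)], \tag{31}$$

where the expectation and the q-expectation are with respect
to $(\pi_1, \ldots, \pi_m)$.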

Burbea and Rao established necessary and sufficient conditions
for the Jensen difference of a $\varphi$-entropy to be convex
[21]. The following proposition generalizes that result, extending
it to Jensen q-differences.

Proposition 7: Let $\varphi : [0, 1] \to \mathbb{R}$ be a function of class
$C^2$ and consider the ($\varphi$-entropy [21]) function $\Psi : [0, 1]^n \to \mathbb{R}$
defined by $\Psi(z) := -\sum_{i=1}^n \varphi(z_i)$. Then, the q-difference
$T_{q,\Psi}^\pi : [0, 1]^{nm} \to \mathbb{R}$ is convex if and only if $\varphi$ is convex and
$-1/\varphi''$ is $(2 - q)$-convex.

Proof: The case $q = 1$ corresponds to the Jensen
difference and was proved by Burbea and Rao (Theorem 1
in [21]). Our proof extends that of Burbea and Rao to $q \ne 1$.

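In general, $y = \{y_1, \ldots, y_m\}$, where $y_t = \{y_{t1}, \ldots, y_{tn}\}$, thus

$$T_{q,\Psi}^\pi(y) = \Psi\Big(\sum_{t=1}^m \pi_t y_t\Big) - \sum_{t=1}^m \pi_t^q \Psi(y_t) = \sum_{i=1}^n \Big[ \sum_{t=1}^m \pi_t^q \varphi(y_{ti}) - \varphi\Big(\sum_{t=1}^m \pi_t y_{ti}\Big) \Big],$$

showing that it suffices to consider $n = 1$, i.e.,

$$T_{q,\Psi}^\pi(y_1, \ldots, y_m) = \sum_{t=1}^m \pi_t^q \varphi(y_t) - \varphi\Big(\sum_{t=1}^m \pi_t y_t\Big); \tag{32}$$

this function is convex on $[0, 1]^m$ if and only if, for every fixed
$a_1, \ldots, a_m \in [0, 1]$, and $b_1, \ldots, b_m \in \mathbb{R}$, the function

$$f(x) = T_{q,\Psi}^\pi(a_1 + b_1 x, \ldots, a_m + b_m x) \tag{33}$$

is convex in $\{x \in \mathbb{R} : a_t + b_t x \in [0, 1],\ t = 1, \ldots, m\}$. Since
$f$ is $C^2$, it is convex if and only if $f''(x) \ge 0$.

We first show that convexity of $f$ (equivalently of $T_{q,\Psi}^\pi$)
implies convexity of $\varphi$. Letting $c_t = a_t + b_t x$,

$$f''(x) = \sum_{t=1}^m \pi_t^q b_t^2 \varphi''(c_t) - \Big(\sum_{t=1}^m \pi_t b_t\Big)^2 \varphi''\Big(\sum_{t=1}^m \pi_t c_t\Big). \tag{34}$$

By choosing $x = 0$, $a_t = a \in [0, 1]$, for $t = 1, \ldots, m$, and
$b_1, \ldots, b_m$ satisfying $\sum_t \pi_t b_t = 0$ in (34), we get

$$f''(0) = \varphi''(a) \sum_{t=1}^m \pi_t^q b_t^2,$$

hence, if $f$ is convex, $\varphi''(a) \ge 0$, thus $\varphi$ is convex.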

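Next, we show that convexity of $f$ also implies $(2 - q)$-convexity
of $-1/\varphi''$. By choosing $x = 0$ (thus $c_t = a_t$) and
$b_t = \pi_t^{1-q} (\varphi''(a_t))^{-1}$, we get

$$f''(0) = \sum_{t=1}^m \frac{\pi_t^{2-q}}{\varphi''(a_t)} - \Big(\sum_{t=1}^m \frac{\pi_t^{2-q}}{\varphi''(a_t)}\Big)^2 \varphi''\Big(\sum_{t=1}^m \pi_t a_t\Big) = \Big(\sum_{t=1}^m \frac{\pi_t^{2-q}}{\varphi''(a_t)}\Big)\, \varphi''\Big(\sum_{t=1}^m \pi_t a_t\Big) \Big[ \frac{1}{\varphi''\big(\sum_{t=1}^m \pi_t a_t\big)} - \sum_{t=1}^m \frac{\pi_t^{2-q}}{\varphi''(a_t)} \Big],$$

where the expression inside the square brackets is the Jensen
$(2 - q)$-difference of $1/\varphi''$ (see Definition 6). Since $\varphi''(x) \ge 0$,
the factor outside the square brackets is non-negative, thus the
Jensen $(2 - q)$-difference of $1/\varphi''$ is also nonnegative and
$-1/\varphi''$ is $(2 - q)$-convex.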



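Finally, we show that if $\varphi$ is convex and $-1/\varphi''$ is
$(2 - q)$-convex, then $f'' \ge 0$, thus $T_{q,\Psi}^\pi$ is convex. Let
$r_t = (q \pi_t^{2-q} / \varphi''(c_t))^{1/2}$ and $s_t = b_t (\pi_t^q \varphi''(c_t) / q)^{1/2}$; then,
non-negativity of $f''$ results from the following chain of
inequalities/equalities:

$$\begin{aligned}
0 &\le \Big(\sum_{t=1}^m r_t^2\Big) \Big(\sum_{t=1}^m s_t^2\Big) - \Big(\sum_{t=1}^m r_t s_t\Big)^2 && (35) \\
&= \Big(\sum_{t=1}^m \frac{\pi_t^{2-q}}{\varphi''(c_t)}\Big) \Big(\sum_{t=1}^m b_t^2 \pi_t^q \varphi''(c_t)\Big) - \Big(\sum_{t=1}^m b_t \pi_t\Big)^2 && (36) \\
&\le \frac{1}{\varphi''\big(\sum_{t=1}^m \pi_t c_t\big)} \sum_{t=1}^m b_t^2 \pi_t^q \varphi''(c_t) - \Big(\sum_{t=1}^m b_t \pi_t\Big)^2 && (37) \\
&= \frac{1}{\varphi''\big(\sum_{t=1}^m \pi_t c_t\big)}\, f''(x), && (38)
\end{aligned}$$

where: (35) is the Cauchy-Schwarz inequality; equality (36)
results from the definitions of $r_t$ and $s_t$ and from the fact that
$r_t s_t = b_t \pi_t$; inequality (37) states the $(2 - q)$-convexity of
$-1/\varphi''$; equality (38) results from (34). ■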

V. The Jensen-Tsallis q-Difference

A. Definition 

As in Subsection III-B, let $P$ be a random probability
distribution taking values in $\{p_y\}_{y=1,\ldots,m}$ according to a
distribution $\pi = (\pi_1, \ldots, \pi_m) \in \Delta^{m-1}$. Then, we may write

$$T_{q,\Psi}^\pi(p_1, \ldots, p_m) = \Psi(E[P]) - E_q[\Psi(P)], \tag{39}$$

where the expectations are with respect to $\pi$. Hence Jensen
q-differences may be seen as deformations of the standard Jensen
differences (16), in which the second expectation is replaced
by a q-expectation.

Let now $\Psi = S_q$, the nonextensive Tsallis q-entropy. Introducing
the random variables $Y$ and $X$, with values respectively
in $\mathcal{Y} = \{1, \ldots, m\}$ and $\mathcal{X} = \{1, \ldots, n\}$, with probability
mass functions $\pi(y) := \pi_y$ and $p(x) := \sum_{y=1}^m p(x|y)\,\pi(y)$,
we have (writing $T_{q,S_q}^\pi$ simply as $T_q^\pi$)

$$T_q^\pi(p_1, \ldots, p_m) = S_q(X) - S_q(X|Y) = I_q(X;Y), \tag{40}$$

where $S_q(X|Y)$ is the Tsallis conditional q-entropy, and
$I_q(X;Y)$ is the Tsallis mutual q-entropy, as defined by Furuichi
[20]. Observe that (40) is a nonextensive analogue of
(17). Since, in general, $I_q \ne \tilde{I}_q$ (see (10)), unless $q = 1$
($I_1 = \tilde{I}_1 = I$), there is no counterpart of (18) in terms of
q-differences. Nevertheless, Lamberti and Majtey have proposed
a non-logarithmic version of the JSD, which corresponds
to using $\tilde{I}_q$ for the Tsallis mutual q-entropy (although this
interpretation is not explicitly mentioned by those authors)
[26].

We call the quantity $T_q^\pi(p_1, \ldots, p_m)$ the Jensen-Tsallis
q-difference (JTqD) of $p_1, \ldots, p_m$ with weights $\pi_1, \ldots, \pi_m$.
Although the JTqD is a generalization of the Jensen-Shannon
divergence, for $q \ne 1$, the term "divergence" would be
misleading in this case, since $T_q^\pi$ may take negative values
(if $q < 1$) and does not vanish in general if $P$ is deterministic.

When $m = 2$ and $\pi = (1/2, 1/2)$, define $T_q := T_q^{(1/2,1/2)}$,

$$T_q(p_1, p_2) = S_q\Big(\frac{p_1 + p_2}{2}\Big) - \frac{S_q(p_1) + S_q(p_2)}{2^q}. \tag{41}$$



Notable cases arise for particular values of $q$:

• For $q = 0$, $S_0(p) = -1 + \|p\|_0$, where $\|x\|_0$ denotes
the so-called 0-norm (although it is not a norm) of vector
$x$, i.e., its number of nonzero components. The Jensen-Tsallis
0-difference is thus

$$T_0(p_1, p_2) = 1 - \|p_1 \odot p_2\|_0, \tag{42}$$

where $\odot$ denotes the Hadamard-Schur (i.e., elementwise)
product. We call $T_0$ the Boolean difference.

• For $q = 1$, since $S_1(p) = H(p)$, $T_1$ is the JSD,

$$T_1(p_1, p_2) = JS(p_1, p_2). \tag{43}$$

• For $q = 2$, $S_2(p) = 1 - \langle p, p \rangle$, where $\langle x, y \rangle = \sum_i x_i y_i$ is
the usual inner product between $x$ and $y$. Consequently,
the Tsallis 2-difference is

$$T_2(p_1, p_2) = \frac{1}{2} - \frac{1}{2} \langle p_1, p_2 \rangle, \tag{44}$$

which we call the linear difference.
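
The three special cases can be checked against a direct implementation of the two-distribution Jensen-Tsallis q-difference with equal weights. This is an illustrative sketch: the function names and the test vectors are hypothetical, and zero-probability outcomes are skipped so that $q = 0$ counts only nonzero components.

```python
import math


def ln_q(x, q):
    """q-logarithm, with the natural log as the q == 1 case."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_entropy(p, q):
    """Tsallis entropy with k = 1; zero-probability outcomes are skipped."""
    return -sum(pi ** q * ln_q(pi, q) for pi in p if pi > 0)


def jt_q_difference(p1, p2, q):
    """S_q((p1 + p2) / 2) - (S_q(p1) + S_q(p2)) / 2^q."""
    mid = [(a + b) / 2.0 for a, b in zip(p1, p2)]
    return tsallis_entropy(mid, q) - (tsallis_entropy(p1, q) + tsallis_entropy(p2, q)) / 2.0 ** q


p1 = [0.5, 0.5, 0.0]
p2 = [0.0, 0.5, 0.5]
overlap = sum(1 for a, b in zip(p1, p2) if a * b != 0)   # ||p1 (.) p2||_0
inner = sum(a * b for a, b in zip(p1, p2))                # <p1, p2>
print(jt_q_difference(p1, p2, 0.0), 1 - overlap)          # Boolean difference
print(jt_q_difference(p1, p2, 1.0))                       # JSD
print(jt_q_difference(p1, p2, 2.0), 0.5 - 0.5 * inner)    # linear difference
```

Note that the Boolean difference is zero for this pair even though the two distributions differ, which is one reason the text avoids calling the quantity a divergence.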



B. Properties of the JTqD 

This subsection presents results regarding convexity and 
extrema of the JTqD, for several values of q, extending known 
properties of the JSD (q = 1). 

Some properties of the JSD are lost in the transition to
nonextensivity. For example, while the former is nonnegative
and vanishes if and only if all the distributions are identical,
this is not true in general with the JTqD. Nonnegativity
of the JTqD is only guaranteed if $q \ge 1$, which explains
why some authors (e.g., [20]) only consider values of $q >
1$, when looking for nonextensive analogues of Shannon's
information theory. Moreover, unless $q = 1$, it is not generally
true that $T_q^\pi(p, \ldots, p) = 0$ or even that $T_q^\pi(p, \ldots, p, p') \ge
T_q^\pi(p, \ldots, p, p)$. For example, the solution to the optimization
problem

$$\min_{p_1 \in \Delta^{n-1}} T_q(p_1, p_2), \tag{45}$$

is, in general, different from $p_2$, unless $q = 1$. Instead, this
minimizer is closer to the uniform distribution, if $q \in [0, 1)$,
and closer to a degenerate distribution, for $q \in (1, 2]$. This is
not so surprising: recall that $T_2(p_1, p_2) = \frac{1}{2} - \frac{1}{2} \langle p_1, p_2 \rangle$; in
this case, (45) becomes a linear program, and the solution is
not $p_2$, but $p_1^* = \delta_j$, where $j = \arg\max_i p_{2i}$.
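
The $q = 2$ case can be illustrated with a short sketch that compares the degenerate minimizer with $p_2$ itself and with a coarse grid over the simplex. The function names, the example $p_2$ and the grid resolution are assumptions made for the illustration.

```python
def linear_difference(p1, p2):
    """Linear difference 1/2 - 1/2 <p1, p2>."""
    return 0.5 - 0.5 * sum(a * b for a, b in zip(p1, p2))


def vertex_minimizer(p2):
    """Degenerate distribution placed on the largest component of p2."""
    j = max(range(len(p2)), key=lambda i: p2[i])
    return [1.0 if i == j else 0.0 for i in range(len(p2))]


p2 = [0.2, 0.5, 0.3]
best = vertex_minimizer(p2)
# Coarse grid over the 2-simplex (step 0.1), used only as a sanity check.
grid = [[i / 10, j / 10, (10 - i - j) / 10] for i in range(11) for j in range(11 - i)]
print(best, linear_difference(best, p2))
print(linear_difference(p2, p2))                      # strictly larger than the minimum
print(min(linear_difference(p, p2) for p in grid))    # matches the vertex value
```

Since the objective is linear in $p_1$, its minimum over the simplex is attained at a vertex, which is why choosing $p_1 = p_2$ is not optimal here.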

We start by recalling a basic result, which essentially
confirms that Tsallis entropies satisfy one of the Suyari axioms
(see Axiom A2 in the Appendix), which states that entropies
should be maximized by uniform distributions.

Proposition 8: The uniform distribution maximizes the
Tsallis entropy for any $q \ge 0$.

Proof: Consider the problem

$$\max_p S_q(p), \quad \text{subject to } \sum_i p_i = 1 \text{ and } p_i \ge 0.$$

Equating the gradient of the Lagrangian to zero yields

$$\frac{\partial}{\partial p_i} \Big( S_q(p) + \lambda \Big(\sum_i p_i - 1\Big) \Big) = -q (q - 1)^{-1} p_i^{q-1} + \lambda = 0,$$

for all $i$. Since all these equations are identical, the solution
is the uniform distribution, which is a maximum, due to the
concavity of $S_q$. ■

The following corollary of Proposition 7 establishes the
joint convexity of the JTqD, for $q \in [0, 1]$. This complements
the joint convexity of the JTD, for $q \in [1, 2]$, which was proved
by Burbea and Rao [21].

Corollary 9: For $q \in [0, 1]$, the JTqD is a jointly convex
function on $\Delta^{n-1}$. Formally, let $\{p_y^{(i)}\}_{y=1,\ldots,m}^{i=1,\ldots,l}$ be a collection
of $l$ sets of probability distributions on $\mathcal{X} = \{1, \ldots, n\}$; then,
for any $(\lambda_1, \ldots, \lambda_l) \in \Delta^{l-1}$,

$$T_q^\pi\Big(\sum_{i=1}^l \lambda_i p_1^{(i)}, \ldots, \sum_{i=1}^l \lambda_i p_m^{(i)}\Big) \le \sum_{i=1}^l \lambda_i T_q^\pi\big(p_1^{(i)}, \ldots, p_m^{(i)}\big).$$

Proof: Observe that the Tsallis entropy (5) of a probability
distribution $p_t = \{p_{t1}, \ldots, p_{tn}\}$ can be written as

$$S_q(p_t) = -\sum_{i=1}^n \varphi_q(p_{ti}), \quad \text{where } \varphi_q(x) = \frac{x - x^q}{1 - q}$$

(the negative of the function in (12));
thus, from Proposition 7, $T_q^\pi$ is convex if and only if $\varphi_q$ is
convex and $-1/\varphi_q''$ is $(2 - q)$-convex. Since $\varphi_q''(x) = q x^{q-2}$,
$\varphi_q$ is convex for $x \ge 0$ and $q \ge 0$. To show the $(2 - q)$-convexity
of $-1/\varphi_q''(x) = -(1/q)\, x^{2-q}$, for $x_i \ge 0$, and $q \in
[0, 1]$, we use a version of the power mean inequality [34],

$$-\Big(\sum_{i=1}^l \lambda_i x_i\Big)^{2-q} \le -\sum_{i=1}^l (\lambda_i x_i)^{2-q} = -\sum_{i=1}^l \lambda_i^{2-q} x_i^{2-q},$$

thus concluding that $-1/\varphi_q''$ is in fact $(2 - q)$-convex. ■

The next corollary, which results from the previous one,
provides an upper bound for the JTqD, for $q \in [0, 1]$. Although
this result is weaker than that of Proposition 11 below, we
include it since it provides insight about the upper extrema of
the JTqD.

Corollary 10: Let $q \in [0, 1]$. Then, $T_q^\pi(p_1, \ldots, p_m) \le S_q(\pi)$.

Proof: From Corollary 9, for $q \in [0, 1]$, $T_q^\pi(p_1, \ldots, p_m)$
is convex. Since its domain is a convex polytope (the cartesian
product of $m$ simplices), its maximum occurs on a vertex,
i.e., when each argument $p_t$ is a degenerate distribution at $x_t$,
denoted $\delta_{x_t}$. In particular, if $n \ge m$, this maximum occurs at
the vertex corresponding to disjoint degenerate distributions,
i.e., such that $x_i \ne x_j$ if $i \ne j$. At this maximum,

$$\begin{aligned}
T_q^\pi(\delta_{x_1}, \ldots, \delta_{x_m}) &= S_q\Big(\sum_{t=1}^m \pi_t \delta_{x_t}\Big) - \sum_{t=1}^m \pi_t^q S_q(\delta_{x_t}) \\
&= S_q\Big(\sum_{t=1}^m \pi_t \delta_{x_t}\Big) && (46) \\
&= S_q(\pi), && (47)
\end{aligned}$$

where the equality in (46) results from $S_q(\delta_{x_t}) = 0$. Notice
that this maximum may not be achieved if $n < m$. ■

The next proposition establishes (upper and lower) bounds
for the JTqD, extending Corollary 10 to any non-negative $q$.

Proposition 11: For $q \ge 0$,

$$T_q^\pi(p_1, \ldots, p_m) \le S_q(\pi), \tag{48}$$

and, if $n \ge m$, the maximum is reached for a set of disjoint
degenerate distributions. As in Corollary 10, this maximum
may not be attained if $n < m$.
For $q \ge 1$,

$$T_q^\pi(p_1, \ldots, p_m) \ge 0, \tag{49}$$

and the minimum is attained in the pure deterministic case,
i.e., when all distributions are equal to the same degenerate
distribution. Results (48) and (49) still hold when $\mathcal{X}$ and $\mathcal{Y}$
are countable sets.
For $q \in [0, 1]$,

$$T_q^\pi(p_1, \ldots, p_m) \ge S_q(\pi)\,[1 - n^{1-q}]. \tag{50}$$

This lower bound (which is zero or negative) is attained when
all distributions are uniform.

Proof: The proof of (48), for $q \ge 0$, results from

$$\begin{aligned}
T_q^\pi(p_1, \ldots, p_m) &= \frac{1}{q-1} \Big[ 1 - \sum_{j=1}^n \Big(\sum_{t=1}^m \pi_t p_{tj}\Big)^q - \sum_{t=1}^m \pi_t^q \Big( 1 - \sum_{j=1}^n p_{tj}^q \Big) \Big] \\
&= S_q(\pi) + \frac{1}{q-1} \sum_{j=1}^n \Big[ \sum_{t=1}^m (\pi_t p_{tj})^q - \Big(\sum_{t=1}^m \pi_t p_{tj}\Big)^q \Big] \\
&\le S_q(\pi), && (51)
\end{aligned}$$

where the inequality holds since, for $y_i \ge 0$: if $q \ge 1$, then
$\sum_i y_i^q \le (\sum_i y_i)^q$; if $q \in [0, 1]$, then $\sum_i y_i^q \ge (\sum_i y_i)^q$.

The proof that $T_q^\pi \ge 0$ for $q \ge 1$ uses the notion
of q-convexity. For countable $\mathcal{X}$, the Tsallis entropy (5) is
nonnegative. Since $-S_q$ is 1-convex, then, by Proposition 5,
it is also q-convex for $q \ge 1$. Consequently, from the q-Jensen
inequality (Proposition 4), for finite $\mathcal{Y}$, with $|\mathcal{Y}| = m$,

$$T_q^\pi(p_1, \ldots, p_m) = S_q\Big(\sum_{t=1}^m \pi_t p_t\Big) - \sum_{t=1}^m \pi_t^q S_q(p_t) \ge 0.$$

Since $S_q$ is continuous, so is $T_q^\pi$, thus the inequality is valid
in the limit as $m \to \infty$, which proves the assertion for $\mathcal{Y}$
countable. Finally, $T_q^\pi(\delta_1, \ldots, \delta_1) = 0$, where $\delta_1$ is some
degenerate distribution.



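Finally, to prove (50), for $q \in [0, 1]$ and $\mathcal{X}$ finite,

$$\begin{aligned}
T_q^\pi(p_1, \ldots, p_m) &= S_q\Big(\sum_{t=1}^m \pi_t p_t\Big) - \sum_{t=1}^m \pi_t^q S_q(p_t) \\
&\ge \sum_{t=1}^m \pi_t S_q(p_t) - \sum_{t=1}^m \pi_t^q S_q(p_t) && (52) \\
&= \sum_{t=1}^m (\pi_t - \pi_t^q)\, S_q(p_t) \\
&\ge S_q(U) \sum_{t=1}^m (\pi_t - \pi_t^q) && (53) \\
&= S_q(\pi)\,[1 - n^{1-q}], && (54)
\end{aligned}$$

where the inequality (52) results from $S_q$ being concave, and
the inequality (53) holds since $\pi_t - \pi_t^q \le 0$, for $q \in [0, 1]$,
and the uniform distribution $U$ maximizes $S_q$ (Proposition 8),
with $S_q(U) = (1 - n^{1-q})/(q - 1)$. ■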
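
The bounds of Proposition 11 lend themselves to a numerical spot check. The sketch below uses hypothetical function names and a small hand-picked example; it illustrates the statements and is in no way a substitute for the proof.

```python
import math


def ln_q(x, q):
    """q-logarithm, with the natural log as the q == 1 case."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_entropy(p, q):
    """Tsallis entropy with k = 1; zero-probability outcomes are skipped."""
    return -sum(pi ** q * ln_q(pi, q) for pi in p if pi > 0)


def jt_q_difference(ps, weights, q):
    """S_q(sum_t w_t p_t) - sum_t w_t^q S_q(p_t)."""
    n = len(ps[0])
    mix = [sum(w * p[i] for w, p in zip(weights, ps)) for i in range(n)]
    return tsallis_entropy(mix, q) - sum(
        w ** q * tsallis_entropy(p, q) for w, p in zip(weights, ps)
    )


weights = [0.3, 0.7]
ps = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]   # generic example with n = 3, m = 2
disjoint = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
uniform = [[1.0 / 3] * 3, [1.0 / 3] * 3]
for q in (0.5, 1.0, 2.0):
    upper = tsallis_entropy(weights, q)
    print(q, jt_q_difference(ps, weights, q), upper, jt_q_difference(disjoint, weights, q))
# Lower bound for q in [0, 1], attained when all distributions are uniform.
q, n = 0.5, 3
print(jt_q_difference(uniform, weights, q), tsallis_entropy(weights, q) * (1 - n ** (1 - q)))
```

In this example the disjoint degenerate pair reaches the upper bound $S_q(\pi)$ for each tested $q$, and the all-uniform pair reaches the lower bound for $q = 0.5$.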

Finally, the next proposition characterizes the convexity/concavity
of the JTqD. As before, it holds more generally
when $\mathcal{Y}$ and $\mathcal{X}$ are countable sets.

Proposition 12: The JTqD is convex in each argument, for
$q \in [0, 2]$, and concave in each argument, for $q \ge 2$.

Proof: Notice that the JTqD can be written as
$T_q^\pi(p_1, \ldots, p_m) = \sum_j \psi(p_{1j}, \ldots, p_{mj})$, with

$$\psi(y_1, \ldots, y_m) = \frac{1}{q-1} \Big[ \sum_i (\pi_i - \pi_i^q)\, y_i + \sum_i \pi_i^q y_i^q - \Big(\sum_i \pi_i y_i\Big)^q \Big]. \tag{55}$$

It suffices to consider the second derivative of $\psi$ with respect
to $y_1$. Introducing $z = \sum_{i=2}^m \pi_i y_i$,

$$\frac{\partial^2 \psi}{\partial y_1^2} = q \big[ \pi_1^q y_1^{q-2} - \pi_1^2 (\pi_1 y_1 + z)^{q-2} \big] = q \pi_1^2 \big[ (\pi_1 y_1)^{q-2} - (\pi_1 y_1 + z)^{q-2} \big]. \tag{56}$$

Since $\pi_1 y_1 \le (\pi_1 y_1 + z) \le 1$, the quantity in (56) is
nonnegative for $q \in [0, 2]$ and non-positive for $q \ge 2$. ■

VI. Conclusion 

In this paper we have introduced new Jensen-Shannon-type
divergences, by extending its two building blocks: convexity
and entropy. We have introduced the concept of q-convexity,
for which we have stated and proved a Jensen q-inequality.
Based on this concept, we have introduced the Jensen-Tsallis
q-difference, a nonextensive generalization of the Jensen-Shannon
divergence. We have characterized the Jensen-Tsallis
q-difference with respect to convexity and extrema, extending
previous results obtained in [21], [19] for the Jensen-Shannon
divergence.
Appendix

In [24], Suyari proposed the following set of axioms (referred
to above as Suyari's axioms) that determine nonextensive entropies
$S_{q,\phi} : \Delta^{n-1} \to \mathbb{R}$ of the form stated in (2). In what
follows, $q$ is fixed and $f_q$ is a function defined on $\Delta^{n-1}$.

(A1) Continuity: $f_q$ is continuous in $\Delta^{n-1}$ and $q \ge 0$;
(A2) Maximality: For any $q \ge 0$, $n \in \mathbb{N}$, and $(p_1, \ldots, p_n) \in
\Delta^{n-1}$, $f_q(p_1, \ldots, p_n) \le f_q(1/n, \ldots, 1/n)$;
(A3) Generalized additivity: For $i = 1, \ldots, n$, $j = 1, \ldots, m_i$,
$p_{ij} \ge 0$, and $p_i = \sum_{j=1}^{m_i} p_{ij}$,

$$f_q(p_{11}, \ldots, p_{n m_n}) = f_q(p_1, \ldots, p_n) + \sum_{i=1}^n p_i^q\, f_q\Big(\frac{p_{i1}}{p_i}, \ldots, \frac{p_{i m_i}}{p_i}\Big);$$

(A4) Expandability: $f_q(p_1, \ldots, p_n, 0) = f_q(p_1, \ldots, p_n)$.